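Background
Recently the business asked for protection against data-center-level failures: we want to build a standby TiDB cluster in a remote data center so that we can fail over at the data-center level at any time. This kind of requirement naturally calls for TiCDC replication. The first step is to take a backup with the br tool, and then start replication from it.
Official documentation & FAQ
Choosing the backup storage
The official documentation recommends S3 or NFS. With local storage, br has each TiKV node save its data to a local directory on that node, so before restoring you have to merge the backup data from all TiKV nodes into one place. That is tedious, and the official documentation does not recommend it.
But we don't have that kind of setup. Merging is a hassle, but it is still a viable path.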
https://docs.pingcap.com/zh/tidb/dev/br-use-overview#如何管理备份数据
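The merge step mentioned above can be sketched roughly as follows. This is only a minimal illustration, not an official procedure: merge_backup_dirs is a hypothetical helper name, it assumes each node's backup directory has already been copied to the machine doing the merge (for example with scp -r), and it assumes file names from different nodes do not collide, since cp would silently overwrite a same-named file.

```shell
# merge_backup_dirs DEST SRC...
# Hypothetical helper: copy the contents of every SRC directory into DEST.
merge_backup_dirs() {
  dest=$1
  shift
  mkdir -p "$dest" || return 1
  for src in "$@"; do
    # "$src/." copies the directory's contents (hidden files included)
    # rather than the directory itself.
    cp -R "$src/." "$dest/" || return 1
  done
}

# Usage sketch (directory names are made up for illustration):
# merge_backup_dirs /data1/backup-merged ./from-10.10.10.1 ./from-10.10.10.2 ./from-10.10.10.3
```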
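Backup user permissions and caveats
According to the FAQ, the backup directory must be readable and writable, and if the br tool and TiKV are on different machines, the users must have the same UID.
The permission requirement makes sense, but why must the UIDs be identical as well?
The test steps follow.
Test procedure
Environment setup
Three test machines are used: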
dbpnew129v 10.10.10.1
dbpnew130v 10.10.10.2
dbpnew131v 10.10.10.3
Check the backup user's uid on all three machines (why the kibana user? Because I am also testing ES on these hosts...)
[kibana@dbpnew129v backup]$ id
uid=49480(kibana) gid=49479(kibana) groups=49479(kibana)
[kibana@dbpnew130v ~]$ id
uid=49479(kibana) gid=49479(kibana) groups=49479(kibana)
[kibana@dbpnew131v ~]$ id
uid=49478(kibana) gid=49479(kibana) groups=49479(kibana)
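The same comparison can be scripted instead of eyeballed. The sketch below is just one way to do it: all_same is a hypothetical helper, and the usage lines assume passwordless ssh to the three hosts listed above.

```shell
# all_same VALUE...
# Hypothetical helper: succeed only if every argument equals the first one.
all_same() {
  first=$1
  for v in "$@"; do
    [ "$v" = "$first" ] || return 1
  done
  return 0
}

# Usage sketch (assumes ssh access to each host as the backup user):
# uids=$(for h in dbpnew129v dbpnew130v dbpnew131v; do ssh "$h" id -u; done)
# if all_same $uids; then echo "uids match"; else echo "uids differ"; fi
```

With the three uids shown above (49480, 49479, 49478) the helper returns non-zero, which is exactly the mismatch this test relies on.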
TiDB version under test
[kibana@dbpnew129v backup]$ tiup cluster display test2
tiup is checking updates for component cluster ...
Starting component `cluster`: /home/kibana/.tiup/components/cluster/v1.13.0/tiup-cluster display test2
Cluster type: tidb
Cluster name: test2
Cluster version: v6.5.2
Deploy user: kibana
SSH type: builtin
Dashboard URL: http://10.10.10.1:2379/dashboard
Grafana URL: http://10.10.10.1:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
10.10.10.1:9093 alertmanager 10.10.10.1 9093/9094 linux/x86_64 Up /data1/tidb-data/alertmanager-9093 /data1/tidb-deploy/alertmanager-9093
10.10.10.1:3000 grafana 10.10.10.1 3000 linux/x86_64 Up - /data1/tidb-deploy/grafana-3000
10.10.10.2:2379 pd 10.10.10.2 2379/2380 linux/x86_64 Up /data1/tidb-data/pd-2379 /data1/tidb-deploy/pd-2379
10.10.10.1:2379 pd 10.10.10.1 2379/2380 linux/x86_64 Up|L|UI /data1/tidb-data/pd-2379 /data1/tidb-deploy/pd-2379
10.10.10.3:2379 pd 10.10.10.3 2379/2380 linux/x86_64 Up /data1/tidb-data/pd-2379 /data1/tidb-deploy/pd-2379
10.10.10.1:9090 prometheus 10.10.10.1 9090/12020 linux/x86_64 Up /data1/tidb-data/prometheus-9090 /data1/tidb-deploy/prometheus-9090
10.10.10.2:4000 tidb 10.10.10.2 4000/10080 linux/x86_64 Up - /data1/tidb-deploy/tidb-4000
10.10.10.1:4000 tidb 10.10.10.1 4000/10080 linux/x86_64 Up - /data1/tidb-deploy/tidb-4000
10.10.10.3:4000 tidb 10.10.10.3 4000/10080 linux/x86_64 Up - /data1/tidb-deploy/tidb-4000
10.10.10.2:20160 tikv 10.10.10.2 20160/20180 linux/x86_64 Up /data1/tidb-data/tikv-20160 /data1/tidb-deploy/tikv-20160
10.10.10.1:20160 tikv 10.10.10.1 20160/20180 linux/x86_64 Up /data1/tidb-data/tikv-20160 /data1/tidb-deploy/tikv-20160
10.10.10.3:20160 tikv 10.10.10.3 20160/20180 linux/x86_64 Up /data1/tidb-data/tikv-20160 /data1/tidb-deploy/tikv-20160
Starting the backup
[kibana@dbpnew129v data1]$ tiup br backup full --pd 10.10.10.2:2379 --storage "local:///data1/backup"
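/data1 has 777 permissions, and the /data1/backup subdirectory given in the command had not been created in advance, so the backup spat out a huge pile of error messages. A full screen of pain...
## Excerpt from the log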
[2023/09/11 10:55:24.686 +08:00] [INFO] [collector.go:77] ["Full Backup failed summary"] [total-ranges=80] [ranges-succeed=0] [ranges-failed=80] [backup-total-ranges=80] [backup-total-regions=82] [unit-name="range start:7480000000000000485f720000000000000000 end:7480000000000000485f72ffffffffffffffff00"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/loc
al/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:7480000000000000185f720000000000000000 end:7480000000000000185f72ffffffffffffffff00"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:748000fffffffffffd5f720000000000000000 end:748000fffffffffffd5f72ffffffffffffffff00"] [error="rpc 
error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:7480000000000000205f720000000000000000 end:7480000000000000205f72ffffffffffffffff00"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context 
canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:74800000000000002e5f69800000000000000300 end:74800000000000002e5f698000000000000003fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context 
canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:7480000000000000345f720000000000000000 end:7480000000000000345f72ffffffffffffffff00"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context 
canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:7480000000000000365f720000000000000000 end:7480000000000000365f72ffffffffffffffff00"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context 
canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:7480000000000000105f69800000000000000100 end:7480000000000000105f698000000000000001fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context 
canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1189\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:893\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [unit-name="range start:7480000000000000165f720000000000000000 end:7480000000000000165f72ffffffffffffffff00"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context 
canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1582\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20230724080549-de985b8e0afc/client.go:1
Error: error happen in store 5 at 10.10.10.2:20160: File or directory not found on TiKV Node (store id: 5; Address: 10.10.10.2:20160). work around:please ensure br and tikv nodes share a same storage and the user of br and tikv has same uid.: [BR:KV:ErrKVStorage]tikv storage occur I/O error
The last line of output says that a file or directory does not exist on the TiKV node.
Next, look at the backup log that br wrote under /tmp:
[2023/09/11 10:55:24.680 +08:00] [ERROR] [push.go:206] [range-sn=0] [error="[BR:KV:ErrKVStorage]tikv storage occur I/O error: File or directory not found on TiKV Node (store id: 5; Address: 10.10.10.2:20160). work around:please ensure br and tikv nodes share a same storage and the user of br and tikv has same uid."] [stack="github.com/pingcap/tidb/br/pkg/backup.(*pushDown).pushBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/push.go:206\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:938\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:852\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75"]
According to the message, br and TiKV must share the same storage, and the user running br must have the same uid as the user running the TiKV nodes.
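Since the useful message is buried under repeated stack traces, it can help to count the distinct error fields in the br log instead of scrolling through it. The following is a rough sketch: br_error_summary is a hypothetical helper, and it assumes the [error="..."] field format seen in the log lines above. An error message that itself contains a double quote would be cut off at that quote.

```shell
# br_error_summary LOGFILE
# Hypothetical helper: count each distinct [error="..."] field, most frequent first.
br_error_summary() {
  grep -o '\[error="[^"]*"' "$1" | sort | uniq -c | sort -rn
}

# Usage sketch (the timestamp suffix of the log file varies per run):
# br_error_summary /tmp/br.log.<timestamp>
```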
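Solving the problem
Taken at face value, this error means the only option is shared storage such as S3 or NFS. But since it complains about a missing file or directory, what if I create the directory in advance?
## On all three TiKV nodes, create the /data1/backup directory in advance as the br backup user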
[kibana@dbpnew131v data1]$ mkdir /data1/backup
## Run the backup with br again
[kibana@dbpnew129v data1]$ tiup br backup full --pd 10.10.10.2:2379 --storage "local:///data1/backup"
tiup is checking updates for component br ...
Starting component `br`: /home/kibana/.tiup/components/br/v7.3.0/br backup full --pd 10.10.10.2:2379 --storage local:///data1/backup
Detail BR log in /tmp/br.log.2023-09-11T11.40.17+0800
Full Backup <------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
Checksum <---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2023/09/11 11:40:24.602 +08:00] [INFO] [collector.go:77] ["Full Backup success summary"] [total-ranges=27] [ranges-succeed=27] [ranges-failed=0] [backup-checksum=569.677625ms] [backup-fast-checksum=9.318469ms] [backup-total-ranges=80] [backup-total-regions=82] [total-take=6.64338341s] [total-kv-size=86.32MB] [average-speed=12.99MB/s] [backup-data-size(after-compressed)=5.027MB] [Size=5026872] [BackupTS=444177742312505345] [total-kv=2098905]
[kibana@dbpnew129v data1]$
It actually succeeded!!
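Creating the directory by hand on every node gets tedious as the cluster grows, so the step can be wrapped in a small loop. This is only a sketch under assumptions: create_backup_dir is a hypothetical helper, and it assumes passwordless ssh to each TiKV host as the user that should own the directory.

```shell
# create_backup_dir DIR HOST...
# Hypothetical helper: create DIR on every HOST over ssh.
create_backup_dir() {
  dir=$1
  shift
  for host in "$@"; do
    # mkdir -p is idempotent, so re-running the helper is harmless.
    ssh "$host" "mkdir -p '$dir'" || {
      echo "failed to create $dir on $host" >&2
      return 1
    }
  done
}

# Usage sketch with the hosts from this test environment:
# create_backup_dir /data1/backup 10.10.10.1 10.10.10.2 10.10.10.3
```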
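Summary
- When backing up with br to local storage, it is not enough for the backup directory to be readable and writable by the user that runs each TiKV node; the directory named in the command must also actually exist on each node (the br node creates its own backup directory, with 777 permissions);
- The hint in the backup log is misleading: "please ensure br and tikv nodes share a same storage and the user of br and tikv has same uid" does not match what actually happened here. The real cause was simply that the specified backup directory had not been created;
- The FAQ requirement for local-disk backups, "if the br tool and TiKV are on different machines, the users must have the same UID", is not strictly necessary: in my test the uids differed and the backup still ran normally;
- Later tests showed that even when the user that starts TiKV and the user that runs br are different, the backup succeeds as long as the directory exists and is readable and writable;
In one sentence: make sure the directory named in the backup command actually exists and is readable and writable by TiKV (securing only the parent directory is not enough, because the backup will not create the directory for us). In my tests, whatever user I used, and whether or not the uids matched, the backup succeeded!!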
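The conclusion above boils down to a check that can be run on each TiKV node before starting a backup. The sketch below is a hypothetical pre-flight helper (check_backup_dir is not part of br). It tests access for whichever user runs it, so it is meant to be run as the user that runs TiKV.

```shell
# check_backup_dir DIR
# Hypothetical pre-flight check: DIR must exist and be readable,
# writable and searchable by the current user.
check_backup_dir() {
  dir=$1
  [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
  [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ] || {
    echo "not accessible: $dir" >&2
    return 1
  }
  echo "ok: $dir"
}

# Usage sketch:
# check_backup_dir /data1/backup
```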