背景
通过DM同步Mysql库到Tidb集群,在一个DM-Worker上启动两个task,一个task同步到3.0.14的集群,另一个task同步到4.0.4的集群,两个集群分别给不同的业务部门使用
问题
4.0集群上的开发人员反馈有一个查询SQLselect id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where vehicle_order_id = '184742761197489224';
在Tidb上查到的数据不对,少数据了,这个少数据指的是和源头Mysql上的查询结果对比,在Tidb上的查询结果少一条数据。
复现
- 我用上面SQL在源Mysql上查询结果有两条数据:
dba@ma.e.com:3306 [xx_swo] : Fri Sep 18 08:17:43 2020 >select id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where vehicle_order_id = '184742761197489224';
+-------+--------------------+-----------------------+---------------------+
| id | wo_id | delivery_code | vehicle_order_id |
+-------+--------------------+-----------------------+---------------------+
| 3068 | 476295277372793241 | D00620191015122055373 | 184742761197489224 |
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224 |
+-------+--------------------+-----------------------+---------------------+
2 rows in set (0.00 sec)
- 我用上面SQL在4.0的Tidb上查询确实只查到了一条数据
MySQL [xx_swo]> select id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where vehicle_order_id = '184742761197489224';
+------+--------------------+-----------------------+--------------------+
| id | wo_id | delivery_code | vehicle_order_id |
+------+--------------------+-----------------------+--------------------+
| 3068 | 476295277372793241 | D00620191015122055373 | 184742761197489224 |
+------+--------------------+-----------------------+--------------------+
1 row in set (0.01 sec)
- 然后我又用上面SQL在3.0的Tidb上查询发现也是只有一条数据
MySQL [xx_swo]> select id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where vehicle_order_id = '184742761197489224';
+------+--------------------+-----------------------+--------------------+
| id | wo_id | delivery_code | vehicle_order_id |
+------+--------------------+-----------------------+--------------------+
| 3068 | 476295277372793241 | D00620191015122055373 | 184742761197489224 |
+------+--------------------+-----------------------+--------------------+
1 row in set (0.01 sec)
排查
-
第一个想到的是不是经过DM同步中间数据丢了呢,然后我把这个表从Mysql上通过Mysqldump导出来,导入到Tidb用上面SQL查询也是少一条数据,那么排除了DM的问题
-
我把上面信息反馈给PingCAP的同学继续排查,首先通过
use index
对比查询走索引和表查询的情况,发现走索引查询和表查询结果都一样,都是有问题的
- 至此PingCAP的技术支持小伙伴需要后端研发人员介入了,又收集了一下Tidb集群的排序规则
- 用Mysql客户端登录Tidb的时候增加两个参数
--column-type-info
和--comments
执行下面SQL:
MySQL [test]> select id,wo_id,delivery_code,vehicle_order_id, vehicle_order_id = '184742761197489224' FROM insurance_wo where wo_id = '476959932151783620' ;
Field 1: `id`
Catalog: `def`
Database: `test`
Table: `insurance_wo`
Org_table: `insurance_wo`
Type: LONG
Collation: binary (63)
Length: 11
Max_length: 5
Decimals: 0
Flags: NOT_NULL PRI_KEY UNSIGNED AUTO_INCREMENT NUM
Field 2: `wo_id`
Catalog: `def`
Database: `test`
Table: `insurance_wo`
Org_table: `insurance_wo`
Type: VAR_STRING
Collation: utf8_bin (83)
Length: 135
Max_length: 18
Decimals: 0
Flags: NOT_NULL MULTIPLE_KEY NO_DEFAULT_VALUE
Field 3: `delivery_code`
Catalog: `def`
Database: `test`
Table: `insurance_wo`
Org_table: `insurance_wo`
Type: VAR_STRING
Collation: utf8_bin (83)
Length: 96
Max_length: 21
Decimals: 0
Flags: MULTIPLE_KEY
Field 4: `vehicle_order_id`
Catalog: `def`
Database: `test`
Table: `insurance_wo`
Org_table: `insurance_wo`
Type: VAR_STRING
Collation: utf8_bin (83)
Length: 96
Max_length: 19
Decimals: 0
Flags: NOT_NULL MULTIPLE_KEY NO_DEFAULT_VALUE
Field 5: `vehicle_order_id = '184742761197489224'`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 1
Max_length: 1
Decimals: 0
Flags: NOT_NULL BINARY NUM
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+
| id | wo_id | delivery_code | vehicle_order_id | vehicle_order_id = '184742761197489224' |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224 | 0 |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+
1 row in set (0.01 sec)
–column-type-info 在结果中显示元数据信息
–comments 是否在发送到服务器的语句中剥离或保留注释,默认值为跳过注释;在连接Tidb的时候最好加上这个参数,不然有注释的SQL,比如强制走索引,设置会不生效
- 通过下面SQL确认where条件vehicle_order_id这个字段值的长度,发现有长度是19位的值,大部分都是18位的,说明19位的数据有点不正常
dba@ma.e.com:3306 [xx_swo] : Fri Sep 18 14:17:08 2020 >select id,wo_id,delivery_code,vehicle_order_id, vehicle_order_id = '184742761197489224', length(vehicle_order_id) FROM insurance_wo where wo_id = 476959932151783620 ;
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+--------------------------+
| id | wo_id | delivery_code | vehicle_order_id | vehicle_order_id = '184742761197489224' | length(vehicle_order_id) |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+--------------------------+
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224 | 1 | 19 |
| 15558 | 476959932151783624 | D00520191016155810548 | 184742744017614672 | 0 | 18 |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+--------------------------+
2 rows in set, 2 warnings (0.02 sec)
- 在Tidb上通过
hex
函数查看vehicle_order_id的十六进制,看看上面两条数据有啥区别,发现在Tidb上少的这条数据的十六进制的值最后多了一个20
,然后在Mysql上查询结果和Tidb上是是一样的。
dba@ma.e.com:3306 [xx_swo] : Fri Sep 18 14:40:53 2020 >select id,wo_id,delivery_code,vehicle_order_id, hex(vehicle_order_id), hex('184742744017614672') FROM insurance_wo where wo_id = '476959932151783624';
+-------+--------------------+-----------------------+--------------------+--------------------------------------+--------------------------------------+
| id | wo_id | delivery_code | vehicle_order_id | hex(vehicle_order_id) | hex('184742744017614672') |
+-------+--------------------+-----------------------+--------------------+--------------------------------------+--------------------------------------+
| 15558 | 476959932151783624 | D00520191016155810548 | 184742744017614672 | 313834373432373434303137363134363732 | 313834373432373434303137363134363732 |
+-------+--------------------+-----------------------+--------------------+--------------------------------------+--------------------------------------+
1 row in set (0.00 sec)
dba@ma.e.com:3306 [xx_swo] : Fri Sep 18 14:40:55 2020 >select id,wo_id,delivery_code,vehicle_order_id, hex(vehicle_order_id), hex('18474 ) FROM insurance_wo where wo_id = '476959932151783620';
+-------+--------------------+-----------------------+---------------------+----------------------------------------+--------------------------------------+
| id | wo_id | delivery_code | vehicle_order_id | hex(vehicle_order_id) | hex('184742761197489224') |
+-------+--------------------+-----------------------+---------------------+----------------------------------------+--------------------------------------+
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224 | 31383437343237363131393734383932323420 | 313834373432373631313937343839323234 |
+-------+--------------------+-----------------------+---------------------+----------------------------------------+--------------------------------------+
- 通过上面的验证,PingCAP的后端技术人员基本推断这条Tidb上查询不到的数据
vehicle_order_id
这个字段内容里面应该是有一个空格,然后通过下面方法验证空格的16进制确实是20
MySQL [test]> select hex('');
+----------+
| hex(' ') |
+----------+
| 20 |
+----------+
1 row in set (0.001 sec)
- OK,至此已经定位到问题的原因是数据内容里面有空格,Tidb和Mysql对空格的处理逻辑不一样,那么如何解决问题呢,有两个方案:
- 数据里面有空格是不正常的,从Mysql源头定位到问题并修正,清洗库里面现在的数据把空格去掉,清洗SQL:
UPDATE `insurance_wo`
SET `vehicle_order_id` = REPLACE (
`vehicle_order_id`,
' ',
''
);
- 新建4.0的Tidb集群并开启新框架的排序规则
new_collations_enabled_on_first_bootstrap
,重新通过DM同步数据到新集群,业务切割到新集群。
新框架的排序规则目前只有在新建集群的时候才能开启,已经跑了业务的集群无法开启,官方后面会支持老集群开启这个功能
新框架下的排序规则文档