1
1
0
0
专栏/.../

TiDB:TiCDC 同步中断案例剖析 —— 时区修改引发的 DDL 问题

 yangzhj  发表于  2024-12-02

问题现象

TiDB 集群 6.5.7,CDC 同步报错 Error 1067: Invalid default value for 'column1'

问题原因

经分析是因为应用端修改时区后执行了受时区影响的 DDL 进而导致同步失败。

排查过程

当上述问题发生时,changefeed 同步会发生异常并中断,并且会接到 resolved ts 和 checkpoint 的延迟告警消息,参考如下告警消息:

接下来,我们在本地环境模拟了上述故障,逐步排查出问题的原因。

接到告警信息后,我们首先需要排查cdc 任务的同步状态,见如下信息:

tiup cdc:v6.5.7 cli changefeed query -s --server=http://xxx:8300/ --changefeed-id=onemirror-reptask
Starting component cdc: /home/tidb/.tiup/components/cdc/v6.5.7/cdc cli changefeed query -s --server=http://xxx:8300/ --changefeed-id=onemirror-reptask
{
  "upstream_id": 7429158424764920159,
  "namespace": "default",
  "id": "onemirror-reptask",
  "state": "warning",
  "checkpoint_tso": 454306087227359251,
  "checkpoint_time": "2024-12-01 16:03:08.703",
  "error": {
    "time": "2024-12-01T16:26:23.185248266+08:00",
    "addr": "xxx:8300",
    "code": "CDC:ErrReachMaxTry",
    "message": "[CDC:ErrReachMaxTry]reach maximum try: 20, error: Error 1067: Invalid default value for 'end_time': Error 1067: Invalid default value for 'end_time'"
  }
}

可以看到 cdc 任务同步报错,报错显示错误的列默认值,接下来我们看下CDC的日志信息:

[2024/12/01 16:15:04.625 +08:00] [WARN] [mysql_ddl_sink.go:128] ["Execute DDL with error, retry later"] [startTs=454306087227359239] [ddl="CREATE TABLE `testtest` (`id` INT PRIMARY KEY,`end_time` TIMESTAMP NOT NULL DEFAULT _UTF8MB4'1970-01-01 00:00:01')"] [namespace=] [changefeed=] [error="Error 1067: Invalid default value for 'end_time'"]
[2024/12/01 16:15:04.625 +08:00] [ERROR] [ddl_sink.go:279] ["Execute DDL failed"] [namespace=default] [changefeed=onemirror-reptask] [DDL="{\"StartTs\":454306087227359239,\"CommitTs\":454306087227359251,\"Query\":\"CREATE TABLE `testtest` (`id` INT PRIMARY KEY,`end_time` TIMESTAMP NOT NULL DEFAULT _UTF8MB4'1970-01-01 00:00:01')\",\"TableInfo\":{\"id\":1189,\"name\":{\"O\":\"testtest\",\"L\":\"testtest\"},\"charset\":\"utf8mb4\",\"collate\":\"utf8mb4_bin\",\"cols\":[{\"id\":1,\"name\":{\"O\":\"id\",\"L\":\"id\"},\"offset\":0,\"origin_default\":null,\"origin_default_bit\":null,\"default\":null,\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":3,\"Flag\":4099,\"Flen\":11,\"Decimal\":0,\"Charset\":\"binary\",\"Collate\":\"binary\",\"Elems\":null,\"ElemsIsBinaryLit\":null},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":2},{\"id\":2,\"name\":{\"O\":\"end_time\",\"L\":\"end_time\"},\"offset\":1,\"origin_default\":null,\"origin_default_bit\":null,\"default\":\"1970-01-01 00:00:01\",\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":7,\"Flag\":129,\"Flen\":19,\"Decimal\":0,\"Charset\":\"binary\",\"Collate\":\"binary\",\"Elems\":null,\"ElemsIsBinaryLit\":null},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":2}],\"index_info\":[{\"id\":1,\"idx_name\":{\"O\":\"PRIMARY\",\"L\":\"primary\"},\"tbl_name\":{\"O\":\"\",\"L\":\"\"},\"idx_cols\":[{\"name\":{\"O\":\"id\",\"L\":\"id\"},\"offset\":0,\"length\":-1}],\"state\":5,\"backfill_state\":0,\"comment\":\"\",\"index_type\":1,\"is_unique\":true,\"is_primary\":true,\"is_invisible\":false,\"is_global\":false}],\"constraint_info\":null,\"fk_info\":null,\"state\":5,\"pk_is_handle\":false,\"is_common_handle\":false,\"common_handle_version\":0,\"comment\":\"\",\"auto_inc_id\":0,\"auto_id_cache\":0,\"auto_rand_id\":0,\"max_col_id\":2,\"max_idx_id\":1,\"max_fk_id\":0,\"max_cst_id\":0,\"update_timestamp\":454306087227359239,\"ShardRowIDBits\":0,\"max_shard_row_id_bits\":0,\"auto_random_bits\":0,\"auto_random_range_bits\":0,\"pre_split_regions\":0,\"partition\":null,\"compression\":\"\",\"view\":null,\"sequence\":null,\"Lock\":null,\"version\":5,\"tiflash_replica\":null,\"is_columnar\":false,\"temp_table_type\":0,\"cache_table_status\":0,\"policy_ref_info\":null,\"stats_options\":null,\"exchange_partition_info\":null,\"ttl_info\":null,\"SchemaID\":113,\"TableName\":{\"db-name\":\"test\",\"tbl-name\":\"testtest\",\"tbl-id\":1189,\"is-partition\":false},\"Version\":454306087227359251,\"RowColumnsOffset\":{\"1\":0,\"2\":1},\"ColumnsFlag\":{\"1\":27,\"2\":1},\"HandleIndexID\":1,\"IndexColumnsOffset\":[[0]]},\"PreTableInfo\":null,\"Type\":3,\"Done\":{},\"Charset\":\"\",\"Collate\":\"\"}"] [error="[CDC:ErrReachMaxTry]reach maximum try: 20, error: Error 1067: Invalid default value for 'end_time': Error 1067: Invalid default value for 'end_time'"] [errorVerbose="[CDC:ErrReachMaxTry]reach maximum try: 20, error: Error 1067: Invalid default value for 'end_time': Error 1067: Invalid default value for 'end_time'\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/retry.run\n\tgithub.com/pingcap/tiflow/pkg/retry/retry_with_opt.go:69\ngithub.com/pingcap/tiflow/pkg/retry.Do\n\tgithub.com/pingcap/tiflow/pkg/retry/retry_with_opt.go:34\ngithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql.(*mysqlDDLSink).execDDLWithMaxRetries\n\tgithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql/mysql_ddl_sink.go:115\ngithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql.(*mysqlDDLSink).WriteDDLEvent\n\tgithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql/mysql_ddl_sink.go:111\ngithub.com/pingcap/tiflow/cdc/owner.(*ddlSinkImpl).writeDDLEvent.func1\n\tgithub.com/pingcap/tiflow/cdc/owner/ddl_sink.go:272\ngithub.com/pingcap/tiflow/cdc/owner.(*ddlSinkImpl).retrySinkAction\n\tgithub.com/pingcap/tiflow/cdc/owner/ddl_sink.go:184\ngithub.com/pingcap/tiflow/cdc/owner.(*ddlSinkImpl).observedRetrySinkAction.func1\n\tgithub.com/pingcap/tiflow/cdc/owner/ddl_sink.go:217\nruntime.goexit\n\truntime/asm_amd64.s:1594"]

从上面报错可以看到 CDC 在执行从上游同步过来的 DDL 时,执行失败,接下来我们通过 dashboard 看下该 DDL 在上游的执行情况:

2024-12-01 16:03:08 (UTC+08:00)
TiDB xxx:4000
[session.go:3807] ["CRUCIAL OPERATION"] [conn=5789377320984773157] [schemaVersion=452] [cur_db=test] [sql="/* ApplicationName=DBeaver 24.2.4 - SQLEditor <Script-4.sql> */ create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01')"] [user=root@%]
2024-12-01 16:03:08 (UTC+08:00)
TiDB xxx:4000
[ddl.go:1029] ["[ddl] start DDL job"] [job="ID:1190, Type:create table, State:queueing, SchemaState:none, SchemaID:113, TableID:1189, RowCount:0, ArgLen:2, start time: 2024-12-01 16:03:08.653 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="/* ApplicationName=DBeaver 24.2.4 - SQLEditor <Script-4.sql> */ create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01')"]

根据表名称和job id信息,我们可以通过ddl jobs查看该DDL在主集群的执行情况:

mysql> admin show ddl jobs where table_name='testtest';
+--------+---------+------------+--------------+--------------+-----------+----------+-----------+---------------------+---------------------+---------------------+--------+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE     | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME         | START_TIME          | END_TIME            | STATE  |
+--------+---------+------------+--------------+--------------+-----------+----------+-----------+---------------------+---------------------+---------------------+--------+
|   1190 | test    | testtest   | create table | public       |       113 |     1189 |         0 | 2024-12-01 16:03:08 | 2024-12-01 16:03:08 | 2024-12-01 16:03:08 | synced |
+--------+---------+------------+--------------+--------------+-----------+----------+-----------+---------------------+---------------------+---------------------+--------+
1 row in set (0.01 sec)

mysql> admin show ddl job queries 1190;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY                                                                                                                                                               |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| /* ApplicationName=DBeaver 24.2.4 - SQLEditor <Script-4.sql> */ create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01') |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> 

我们在测试环境执行下上述主库的DDL,发现无法执行:

mysql> use test1;
Database changed
mysql> create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01');
ERROR 1067 (42000): Invalid default value for 'end_time'
mysql> 

在参考并尝试了一些和本问题无关的解决方案后,我们重新查找了官网文档对 TIMESTAMP 类型的定义,发现如下信息:

根据官网说明:TIMESTAMP 类型是日期和时间的组合,支持的范围是 UTC 时间从 1970-01-01 00:00:01.0000002038-01-19 03:14:07.999999。而本报错信息为:end_time 列默认值是错误的时间戳值。至此我们开始怀疑应用使用了特殊的方式影响了 TIMESTAMP 的值并,创建该表。

根据官网说明:当存储 TIMESTAMP 时,TiDB 会将当前时区的 TIMESTAMP 值转换为 UTC 时区。当读取 TIMESTAMP 时,TiDB 将存储的 TIMESTAMP 值从 UTC 时区转换为当前时区(注意:DATETIME 不会这样处理)。每次连接的默认时区是服务器的本地时区,可以通过环境变量 time_zone 进行修改。TiDB 在存取TIMESTAMP 值时会受到 time_zone 系统变量的影响。

根据官网说明:

在默认情况下,TiDB time_zone 值为系统主机的时区,即默认东八区(UTC+8):

$ timedatectl
      Local time: Sun 2024-12-01 16:55:49 CST
  Universal time: Sun 2024-12-01 08:55:49 UTC
        RTC time: Sun 2024-12-01 08:55:49
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
$ 

默认情况下,基于 UTC+8 主机环境的 TiDB 环境,TIMESTAMP的写入范围是:1970-01-01 08:00:01.0000002038-01-19 11:14:07.999999

基于上述3个信息,并结合现有的报错信息,我们可以推测:应用在会话级别将 time_zone 调整为 UTC 时区后,再执行上述 DDL,示例如下:

set time_zone='+00:00';
create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01');

下面我们来验证一下:

mysql> set time_zone='+00:00';
Query OK, 0 rows affected (0.00 sec)

mysql> create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01');
Query OK, 0 rows affected (0.55 sec)

mysql> show create table testtest\G
*************************** 1. row ***************************
       Table: testtest
Create Table: CREATE TABLE `testtest` (
  `id` int(11) NOT NULL,
  `end_time` timestamp NOT NULL DEFAULT '1970-01-01 00:00:01',
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> 
mysql> set time_zone='+08:00';
Query OK, 0 rows affected (0.00 sec)

mysql> show create table testtest\G
*************************** 1. row ***************************
       Table: testtest
Create Table: CREATE TABLE `testtest` (
  `id` int(11) NOT NULL,
  `end_time` timestamp NOT NULL DEFAULT '1970-01-01 08:00:01',
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> 

可以看到:

  1 在会话级别调整 time_zone 为 UTC 时区后,该 DDL 可以正常执行。

  2 在会话级别调整 time_zone 时区值后,从命令行看到的 TIMESTAMP DEFAULT 值是通过 time_zone 转化后的值。

而 cdc 报错的原因是,cdc 在下游执行时默认是 UTC+8 时区,因此导致 DDL 报错。

接下来,我们联系了应用方,询问是否有上述写入情况,应用反馈确实使用上述的方式创建表。

处理方案

1 处理方式1: 手工在下游执行 DDL 并让 CDC 跳过问题 DDL;

2 处理方式2: 重建 CDC 链路。

后续建议

告知应用方不建议使用 TIMESTAMP 数据类型,建议使用 DATETIME 类型。原因参考 TiDB 官网 TIMESTAMP 类型警告部分说明。

1
1
0
0

版权声明:本文为 TiDB 社区用户原创文章,遵循 CC BY-NC-SA 4.0 版权协议,转载请附上原文出处链接和本声明。

评论
暂无评论