问题现象
TiDB 集群 6.5.7,CDC 同步报错 Error 1067: Invalid default value for 'column1'
问题原因
经分析是因为应用端修改时区后执行了受时区影响的 DDL 进而导致同步失败。
排查过程
当上述问题发生时,changefeed 同步会发生异常并中断,并且会接到 resolved ts 和 checkpoint 的延迟告警消息,参考如下告警消息:
接下来,我们在本地环境模拟了上述故障,逐步排查出问题的原因。
接到告警信息后,我们首先需要排查cdc 任务的同步状态,见如下信息:
tiup cdc:v6.5.7 cli changefeed query -s --server=http://xxx:8300/ --changefeed-id=onemirror-reptask
Starting component cdc: /home/tidb/.tiup/components/cdc/v6.5.7/cdc cli changefeed query -s --server=http://xxx:8300/ --changefeed-id=onemirror-reptask
{
"upstream_id": 7429158424764920159,
"namespace": "default",
"id": "onemirror-reptask",
"state": "warning",
"checkpoint_tso": 454306087227359251,
"checkpoint_time": "2024-12-01 16:03:08.703",
"error": {
"time": "2024-12-01T16:26:23.185248266+08:00",
"addr": "xxx:8300",
"code": "CDC:ErrReachMaxTry",
"message": "[CDC:ErrReachMaxTry]reach maximum try: 20, error: Error 1067: Invalid default value for 'end_time': Error 1067: Invalid default value for 'end_time'"
}
}
可以看到 cdc 任务同步报错,报错显示错误的列默认值,接下来我们看下CDC的日志信息:
[2024/12/01 16:15:04.625 +08:00] [WARN] [mysql_ddl_sink.go:128] ["Execute DDL with error, retry later"] [startTs=454306087227359239] [ddl="CREATE TABLE `testtest` (`id` INT PRIMARY KEY,`end_time` TIMESTAMP NOT NULL DEFAULT _UTF8MB4'1970-01-01 00:00:01')"] [namespace=] [changefeed=] [error="Error 1067: Invalid default value for 'end_time'"]
[2024/12/01 16:15:04.625 +08:00] [ERROR] [ddl_sink.go:279] ["Execute DDL failed"] [namespace=default] [changefeed=onemirror-reptask] [DDL="{\"StartTs\":454306087227359239,\"CommitTs\":454306087227359251,\"Query\":\"CREATE TABLE `testtest` (`id` INT PRIMARY KEY,`end_time` TIMESTAMP NOT NULL DEFAULT _UTF8MB4'1970-01-01 00:00:01')\",\"TableInfo\":{\"id\":1189,\"name\":{\"O\":\"testtest\",\"L\":\"testtest\"},\"charset\":\"utf8mb4\",\"collate\":\"utf8mb4_bin\",\"cols\":[{\"id\":1,\"name\":{\"O\":\"id\",\"L\":\"id\"},\"offset\":0,\"origin_default\":null,\"origin_default_bit\":null,\"default\":null,\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":3,\"Flag\":4099,\"Flen\":11,\"Decimal\":0,\"Charset\":\"binary\",\"Collate\":\"binary\",\"Elems\":null,\"ElemsIsBinaryLit\":null},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":2},{\"id\":2,\"name\":{\"O\":\"end_time\",\"L\":\"end_time\"},\"offset\":1,\"origin_default\":null,\"origin_default_bit\":null,\"default\":\"1970-01-01 00:00:01\",\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":7,\"Flag\":129,\"Flen\":19,\"Decimal\":0,\"Charset\":\"binary\",\"Collate\":\"binary\",\"Elems\":null,\"ElemsIsBinaryLit\":null},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":2}],\"index_info\":[{\"id\":1,\"idx_name\":{\"O\":\"PRIMARY\",\"L\":\"primary\"},\"tbl_name\":{\"O\":\"\",\"L\":\"\"},\"idx_cols\":[{\"name\":{\"O\":\"id\",\"L\":\"id\"},\"offset\":0,\"length\":-1}],\"state\":5,\"backfill_state\":0,\"comment\":\"\",\"index_type\":1,\"is_unique\":true,\"is_primary\":true,\"is_invisible\":false,\"is_global\":false}],\"constraint_info\":null,\"fk_info\":null,\"state\":5,\"pk_is_handle\":false,\"is_common_handle\":false,\"common_handle_version\":0,\"comment\":\"\",\"auto_inc_id\":0,\"auto_id_cache\":0,\"auto_rand_id\":0,\"max_col_id\":2,\"max_idx_id\":1,\"max_fk_id\":0,\"max_cst_id\":0,\"update_timestamp\":454306087227359239,\"ShardRowIDBits\":0,\"max_shard_row_id_bits\":0,\"auto_random_bits\":0,\"auto_random_range_bits\":0,\"pre_split_regions\":0,\"partition\":null,\"compression\":\"\",\"view\":null,\"sequence\":null,\"Lock\":null,\"version\":5,\"tiflash_replica\":null,\"is_columnar\":false,\"temp_table_type\":0,\"cache_table_status\":0,\"policy_ref_info\":null,\"stats_options\":null,\"exchange_partition_info\":null,\"ttl_info\":null,\"SchemaID\":113,\"TableName\":{\"db-name\":\"test\",\"tbl-name\":\"testtest\",\"tbl-id\":1189,\"is-partition\":false},\"Version\":454306087227359251,\"RowColumnsOffset\":{\"1\":0,\"2\":1},\"ColumnsFlag\":{\"1\":27,\"2\":1},\"HandleIndexID\":1,\"IndexColumnsOffset\":[[0]]},\"PreTableInfo\":null,\"Type\":3,\"Done\":{},\"Charset\":\"\",\"Collate\":\"\"}"] [error="[CDC:ErrReachMaxTry]reach maximum try: 20, error: Error 1067: Invalid default value for 'end_time': Error 1067: Invalid default value for 'end_time'"] [errorVerbose="[CDC:ErrReachMaxTry]reach maximum try: 20, error: Error 1067: Invalid default value for 'end_time': Error 1067: Invalid default value for 'end_time'\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/retry.run\n\tgithub.com/pingcap/tiflow/pkg/retry/retry_with_opt.go:69\ngithub.com/pingcap/tiflow/pkg/retry.Do\n\tgithub.com/pingcap/tiflow/pkg/retry/retry_with_opt.go:34\ngithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql.(*mysqlDDLSink).execDDLWithMaxRetries\n\tgithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql/mysql_ddl_sink.go:115\ngithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql.(*mysqlDDLSink).WriteDDLEvent\n\tgithub.com/pingcap/tiflow/cdc/sinkv2/ddlsink/mysql/mysql_ddl_sink.go:111\ngithub.com/pingcap/tiflow/cdc/owner.(*ddlSinkImpl).writeDDLEvent.func1\n\tgithub.com/pingcap/tiflow/cdc/owner/ddl_sink.go:272\ngithub.com/pingcap/tiflow/cdc/owner.(*ddlSinkImpl).retrySinkAction\n\tgithub.com/pingcap/tiflow/cdc/owner/ddl_sink.go:184\ngithub.com/pingcap/tiflow/cdc/owner.(*ddlSinkImpl).observedRetrySinkAction.func1\n\tgithub.com/pingcap/tiflow/cdc/owner/ddl_sink.go:217\nruntime.goexit\n\truntime/asm_amd64.s:1594"]
从上面报错可以看到 CDC 在执行从上游同步过来的 DDL 时,执行失败,接下来我们通过 dashboard 看下该 DDL 在上游的执行情况:
2024-12-01 16:03:08 (UTC+08:00)
TiDB xxx:4000
[session.go:3807] ["CRUCIAL OPERATION"] [conn=5789377320984773157] [schemaVersion=452] [cur_db=test] [sql="/* ApplicationName=DBeaver 24.2.4 - SQLEditor <Script-4.sql> */ create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01')"] [user=root@%]
2024-12-01 16:03:08 (UTC+08:00)
TiDB xxx:4000
[ddl.go:1029] ["[ddl] start DDL job"] [job="ID:1190, Type:create table, State:queueing, SchemaState:none, SchemaID:113, TableID:1189, RowCount:0, ArgLen:2, start time: 2024-12-01 16:03:08.653 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="/* ApplicationName=DBeaver 24.2.4 - SQLEditor <Script-4.sql> */ create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01')"]
根据表名称和job id信息,我们可以通过ddl jobs查看该DDL在主集群的执行情况:
mysql> admin show ddl jobs where table_name='testtest';
+--------+---------+------------+--------------+--------------+-----------+----------+-----------+---------------------+---------------------+---------------------+--------+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE |
+--------+---------+------------+--------------+--------------+-----------+----------+-----------+---------------------+---------------------+---------------------+--------+
| 1190 | test | testtest | create table | public | 113 | 1189 | 0 | 2024-12-01 16:03:08 | 2024-12-01 16:03:08 | 2024-12-01 16:03:08 | synced |
+--------+---------+------------+--------------+--------------+-----------+----------+-----------+---------------------+---------------------+---------------------+--------+
1 row in set (0.01 sec)
mysql> admin show ddl job queries 1190;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| /* ApplicationName=DBeaver 24.2.4 - SQLEditor <Script-4.sql> */ create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01') |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql>
我们在测试环境执行下上述主库的DDL,发现无法执行:
mysql> use test1;
Database changed
mysql> create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01');
ERROR 1067 (42000): Invalid default value for 'end_time'
mysql>
在参考并尝试了一些和本问题无关的解决方案后,我们重新查找了官网文档对 TIMESTAMP 类型的定义,发现如下信息:
根据官网说明:TIMESTAMP
类型是日期和时间的组合,支持的范围是 UTC 时间从 1970-01-01 00:00:01.000000
到 2038-01-19 03:14:07.999999
。而本报错信息为:end_time 列默认值是错误的时间戳值。至此我们开始怀疑应用使用了特殊的方式影响了 TIMESTAMP 的值并,创建该表。
根据官网说明:当存储 TIMESTAMP
时,TiDB 会将当前时区的 TIMESTAMP
值转换为 UTC 时区。当读取 TIMESTAMP
时,TiDB 将存储的 TIMESTAMP
值从 UTC 时区转换为当前时区(注意:DATETIME
不会这样处理)。每次连接的默认时区是服务器的本地时区,可以通过环境变量 time_zone
进行修改。TiDB 在存取TIMESTAMP
值时会受到 time_zone 系统变量的影响。
根据官网说明:
在默认情况下,TiDB time_zone 值为系统主机的时区,即默认东八区(UTC+8):
$ timedatectl
Local time: Sun 2024-12-01 16:55:49 CST
Universal time: Sun 2024-12-01 08:55:49 UTC
RTC time: Sun 2024-12-01 08:55:49
Time zone: Asia/Shanghai (CST, +0800)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: n/a
$
默认情况下,基于 UTC+8 主机环境的 TiDB 环境,TIMESTAMP
的写入范围是:1970-01-01 08:00:01.000000
到 2038-01-19 11:14:07.999999
。
基于上述3个信息,并结合现有的报错信息,我们可以推测:应用在会话级别将 time_zone 调整为 UTC 时区后,再执行上述 DDL,示例如下:
set time_zone='+00:00';
create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01');
下面我们来验证一下:
mysql> set time_zone='+00:00';
Query OK, 0 rows affected (0.00 sec)
mysql> create table testtest(id int primary key,end_time timestamp not null default '1970-01-01 00:00:01');
Query OK, 0 rows affected (0.55 sec)
mysql> show create table testtest\G
*************************** 1. row ***************************
Table: testtest
Create Table: CREATE TABLE `testtest` (
`id` int(11) NOT NULL,
`end_time` timestamp NOT NULL DEFAULT '1970-01-01 00:00:01',
PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)
mysql>
mysql> set time_zone='+08:00';
Query OK, 0 rows affected (0.00 sec)
mysql> show create table testtest\G
*************************** 1. row ***************************
Table: testtest
Create Table: CREATE TABLE `testtest` (
`id` int(11) NOT NULL,
`end_time` timestamp NOT NULL DEFAULT '1970-01-01 08:00:01',
PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)
mysql>
可以看到:
1 在会话级别调整 time_zone 为 UTC 时区后,该 DDL 可以正常执行。
2 在会话级别调整 time_zone 时区值后,从命令行看到的 TIMESTAMP DEFAULT 值是通过 time_zone 转化后的值。
而 cdc 报错的原因是,cdc 在下游执行时默认是 UTC+8 时区,因此导致 DDL 报错。
接下来,我们联系了应用方,询问是否有上述写入情况,应用反馈确实使用上述的方式创建表。
处理方案
1 处理方式1: 手工在下游执行 DDL 并让 CDC 跳过问题 DDL;
2 处理方式2: 重建 CDC 链路。
后续建议
告知应用方不建议使用 TIMESTAMP 数据类型,建议使用 DATETIME 类型。原因参考 TiDB 官网 TIMESTAMP 类型警告部分说明。