【是否原创】是
【首发渠道】TiDB 社区

背景

在生产测试环境遇到这样一个场景，它的表结构 objects 大致是这样的：

键名	id	bucket_id	name	… …	version_id
类型	bigint(20)	varchar(270)	varchar(1024)	… …	varchar(32)
索引	主键	组合索引: (`bucket_id`, `name`(498))

这个表 objects 有一个联合前缀索引 idx(bucket_id, name(498), version_id)，之所以这么用有以下原因：

所有查询类似 select * from objects where bucket_id = “XXXX” and name > “xxxxy” order by name asc limit n;
由于业务标准属性要求，name 字段需要达到 1024 长度
索引有长度限制，TiDB 默认值以及 MySQL 长度都是 3072 字节（这里是字节数）

而 objects 表数据分布又有一个特点：

bucket_id 数量极少，只有几十到几百个
同样 bucket_id 下，name 又极多，从几十万到几百万，甚至千万级别

关于 name 列长度，原来没有那么大，之前是把它设为普通联合索引 idx(bucket_id, name, version_id)，这时候查询速度非常快，但是增大 name 并调整 idx 为前缀索引后，执行性能突然变慢了。

在不同 bucket_id 下，随着 name 条数增多，执行耗时呈现出线性增长，当达到 661666 条记录并且机器负载很低的情况下，耗时达到 0.49 秒，对业务来说已经太慢了：

bucket.meta.bucket1（总共 661666 条记录）	bucket.meta.bkt20211213（总共 27995202条记录）
5 rows in set (0.49 sec)	5 rows in set (16.35 sec)

分析原因

从普通联合索引，转为前缀联合索引，执行耗时差别之大是出乎意料的，下面看看两种执行计划的差别：

普通联合索引

mysql> explain select * from objects where bucket_id="bucket_id_0" and name >= "aaaaa" order by name asc limit 1\"G
*************************** 1. row ***************************
           id: IndexLookUp_24
      estRows: 1.00
         task: root
access object: 
operator info: limit embedded(offset:0, count:1)
*************************** 2. row ***************************
           id: ├─Limit_23(Build)
      estRows: 1.00
         task: cop[tikv]
access object: 
operator info: offset:0, count:1
*************************** 3. row ***************************
           id: │ └─IndexRangeScan_21
      estRows: 1.00
         task: cop[tikv]
access object: table:objects, index:idx_tt(bucket_id, name, ver_id)
operator info: range:["bucket_id_0" "aaaaa","bucket_id_0" +inf], keep order:true
*************************** 4. row ***************************
           id: └─TableRowIDScan_22(Probe)
      estRows: 1.00
         task: cop[tikv]
access object: table:objects
operator info: keep order:false
4 rows in set (0.00 sec)

这里通过 IndexRangeScan_21 顺序扫描 Limit_23 个 rowId，然后进行回表 TableRowIDScan_22，取出数据后返回，效率非常快。

前缀联合索引

mysql> explain select * from objects where bucket_id="bucket_id_0" and name >= "aaaaa" order by name asc limit 1\"G
*************************** 1. row ***************************
           id: TopN_9
      estRows: 1.00
         task: root
access object: 
operator info: demo_prefix.objects.name, offset:0, count:1
*************************** 2. row ***************************
           id: └─IndexLookUp_21
      estRows: 1.00
         task: root
access object: 
operator info: 
*************************** 3. row ***************************
           id:   ├─IndexRangeScan_17(Build)
      estRows: 10435.77
         task: cop[tikv]
access object: table:objects, index:idx_tt(bucket_id, name, ver_id)
operator info: range:["bucket_id_0" "aaaaa","bucket_id_0" +inf], keep order:false
*************************** 4. row ***************************
           id:   └─TopN_20(Probe)
      estRows: 1.00
         task: cop[tikv]
access object: 
operator info: demo_prefix.objects.name, offset:0, count:1
*************************** 5. row ***************************
           id:     └─Selection_19
      estRows: 10435.77
         task: cop[tikv]
access object: 
operator info: ge(demo_prefix.objects.name, "aaaaa")
*************************** 6. row ***************************
           id:       └─TableRowIDScan_18
      estRows: 10435.77
         task: cop[tikv]
access object: table:objects
operator info: keep order:false
6 rows in set (0.01 sec)

这里执行计划同样选择了 idx 索引走 IndexLookUp_21 流程：

build 端 IndexRangeScan_17 扫描 range:[“bucket_id_0” “aaaaa”,“bucket_id_0” +inf] 范围内所有的 rowId
Probe 端：
- TableRowIDScan_18 将 rowId 进行回表，检索出完整的行记录
- Selection_19 执行 name >= “aaaaa” 的过滤
- TopN_20 按照 name 排序，选出需要返回的 1 行
root 端，TopN_9 汇聚 copTask 所有的 TopN_20 结果，返回最终的结果。

从这个执行计划看，正确性没有任何值得怀疑的地方，但是在我们的数据场景中，执行性能就很低。慢就慢在将 range:[“bucket_id_0” “aaaaa”,“bucket_id_0” +inf] 范围内所有的 rowId 进行了 TableRowIDScan_18 操作，而我们的数据场景，这里可能有几十万到上千万数据记录，所以在记录数达到 27995202 条时候，耗时达到 16.35 sec。

优化思考

普通的联合索引执行计划，在 order by indexColA, indexColB limit N 时候，直接顺序扫描 N 条记录即可。

但是在前缀索引的情况，索引中列不包括全部的列的值，这时候没办法直接按照顺序扫描的 keys 保证有序，所以在当前的执行计划中是进行了所有 rowId 的回表，然后取出完整的列进行 topN 排序返回的。

但是即使在前缀联合索引情况下，索引值在某种意义下，仍然是有序的，比如 idx(a, b(32))：

那么 idx 是按照 a 列严格有序的
对于 b 列，会根据 b 的前 31 个字符严格有序

而对于前缀索引 order by indexColA, indexColB limit N 时候，基于上述有序性，是可以根据数据的情况，确定要查询的数据在一定的范围的。

假设我们有下面一条 sql，而表 t 有个前缀索引 idx(a, b[3]) :

select * from t where a= "a", b>= "dddd" order by a, b asc limit 5;

表 t 的数据如下，col_a 代表 a 列的值，col_b 代表 b 列的值，keyNum 是这里标注的一个序号，代表当前列的值在 idx 索引上的虚拟序号：

keyNum                col_a               col_b
1                     a                   ddda
2                     a                   dddf
3                     a                   dddc
4                     a                   e
5                     a                   f
6                     a                   f
7                     a                   g			
8                     a                   hadf
9                     a                   hadc
10                    a                   hadgs
11                    a                   j
12                    a                   k
13                    a                   l
...                   ...                 ...
...                   ...                 ...

我们的 sql 希望返回的数据行数为 5 条，我们根据上述的有序性，顺序扫描 idx，根据数据情况，可以确定要返回的记录一定位于某个范围内。

满足我们查询结果的数据记录集为：

keyNum                col_a               col_b
2                     a                   dddf
4                     a                   e
5                     a                   f
6                     a                   f
7                     a                   g

我们以 cnt 代表需要扫描的 idx keys，cnt = cn1 + cn2 + cn3，最终扫描返回以下行的 rowId 即可停止，因为后面的行是一定不满足的：

keyNum                col_a               col_b
1                     a                   ddda
2                     a                   dddf
3                     a                   dddc
4                     a                   e
5                     a                   f
6                     a                   f
7                     a                   g			
8                     a                   hadf
9                     a                   hadc
10                    a                   hadgs

cnt = cn1 + cn2 + cn3 计算：

cn1: 因为查询条件 b>= “dddd” 是被截断的，那么对于截断的 “ddd”，相同的记录 rowId 要返回，这里是前三条
cn2: cn1 查询来的记录有可能都满足，有可能都不满足，那么按照都不满足的情况，扫描后面四条
cn3: 由于cn1 + cn2 至少有 4 条是满足的，那么期望再扫描一条，由于 8、9、10 三条前缀都是 “had”，那么把这三条也扫描出来

最后返回的 rowId 个数为 3 + 4 + 3 = 10 个，也就是说我们的查询结果数据集，一定在这 10 个 rowId 的记录内。

数据同上，查询条件为 select * from t where a= “a”, b>= “e” order by a, b asc limit 5;

cn1 + cn2 + cn3 三个值：

cn1：由于 b>=“e” 没有被截断，那么 keyNum 为 4 扫描出来就是满足的，这个记录数为 1.
cn2:为 5、6、7的记录三条
cn3:目前cn1 + cn2 = 4 条，需要再扫描后面连续的具有相同前缀“had”的三条。

最后返回的 rowId 个数为 1 + 3 + 3 = 7 个。

优化尝试

当前的执行计划，因为扫描了 range:[“bucket_id_0” “aaaaa”,“bucket_id_0” +inf] 范围内的 rowId 进行 TableRowIDScan_22 回表导致耗时较长。

在上一节，通过顺序扫描 idx，根据数据集情况，可以提前终止扫描过程，极大的减少了 rowId 数量。

进行一个如下的优化尝试，生成如下的执行计划：

mysql> explain select * from objects where bucket_id="bucket_id_0" and name >= "aaaaa" order by name asc limit 1\"G
*************************** 1. row ***************************
           id: Like_TopN_9
      estRows: 1.00
         task: root
access object: 
operator info: test.objects.name, offset:0, count:1
*************************** 2. row ***************************
           id: └─IndexLookUp_17
      estRows: 33.33
         task: root
access object: 
operator info: 
*************************** 3. row ***************************
           id:   ├─Like_Limit_16(Build)
      estRows: 1.00
         task: cop[tikv]
access object: 
operator info: offset:0, count:1
*************************** 4. row ***************************
           id:   │ └─IndexRangeScan_13
      estRows: 33.33
         task: cop[tikv]
access object: table:objects, index:idx_tt(bucket_id, name, ver_id)
operator info: range:["bucket_id_0" "aaaaa","bucket_id_0" +inf], keep order:true, stats:pseudo
*************************** 5. row ***************************
           id:   └─Selection_15(Probe)
      estRows: 33.33
         task: cop[tikv]
access object: 
operator info: ge(test.objects.name, "aaaaa")
*************************** 6. row ***************************
           id:     └─TableRowIDScan_14
      estRows: 33.33
         task: cop[tikv]
access object: table:objects
operator info: keep order:false, stats:pseudo
6 rows in set (0.02 sec)

执行流程如下：

Build 端：
- IndexRangeScan_13 顺序扫描 idx keys
- Like_Limit_16 根据 idx 的数据集返回 rowIds
Probe 端：
- TableRowIDScan_14 对 rowId 进行回表检索完整的行记录
- Selection_15 用过滤条件 name >= “aaaaa” 过滤数据
root 端执行 Like_TopN_9 提取前 n 条返回

说明：

Like_Limit_16 实现了类似 Limit 下推算子的语法功能，但它会根据扫描出来的值的特征决定是否已经获取足够的数据，判断方法如之前描述。
Like_TopN_9 同样实现类似 TopN 算子的语法功能，他会根据 tikv 返回的记录的值的特征决定是否已经获取足够的数据，判断方法如之前描述。

正确性证明

Like_Limit 算子保证了正确记录的 rowId 一定会被扫描到
Selection 算子将不符合的记录过滤掉
由于 IndexRangeScan 的 keep order:true， IndexLookUp 返回的记录，按照 idx 索引顺序（keyNum ）有序
Like_TopN 收到的记录按照 bucket_id, name(498) 有序，但是不能说按照 bucket_id，name 有序，这里进行一次 heap 排序，使其按照 bucket_id，name 有序，最后返回正确的记录

归纳

上述的执行计划，它可以直接针对以下类型场景：

create table t (id bigint, a varchar(8), b varchar(32), c int, idx(b(16)) );
select * from where b > "xxxx"  order by b limit n;
select * from where b > "xxxx"  order by b limit m, n; 

create table t (id bigint, a varchar(8), b varchar(32), c int, idx(a, b(16)) );
select * from where a = "xxxxx" and b > "xxxx"  order by b limit n;
select * from where a = "xxxxx" and b > "xxxx"  order by b limit m, n; 

create table t (id bigint, a varchar(8), b varchar(32), c varchar(32), idx(a, b, c(16)));
select * from where a = "xxxxx" and b = "xxxx" and c > "xxxx" order by c limit n;
select * from where a = "xxxxx" and b = "xxxx" and c > "xxxx" order by c limit m, n;

特点如下：

索引 index ( a, b, c, d, … , z[k] )
查询条件 select XXX, XXX from t where a=“XXX” and b=“xxx”, c = “xxx”, … , and z > “XXX” limit m, n;

它利用了前缀索引按照 z[k - 1] 绝对有序的特点，将正确的记录数的 rowId 确定在一个范围内，减少 rowId 的扫描，从而达到跟普通索引上执行的效率。

延伸一

create table t (id bigint, a varchar(8), b varchar(32), c varchar(32), idx(a, b, c(16)));
select * from t where a > "xxxx" and  b > "xxx" and c > "xxxx" order by a, b, c limit n;

这个 sql 也是可以的，执行计划在 IndexRangeScan 的 Build 改为：

IndexRangeScan，range:[“xxx” “xxx” “xxxx”, +inf]
Selection 过滤掉 b > “xxx”
Like_Limit 对 a，b, c > “xxxx” 进行判断

create table t (id bigint, a varchar(8), b varchar(32), c varchar(32), idx(a, b(16), c));
select * from t where a > "xxxx" and  b > "xxx" and c > "xxxx" order by a, b, c limit n;

执行计划在 IndexRangeScan 的 Build 改为：

IndexRangeScan，range:[“xxx” “xxx” , +inf]，不带 c 的参数
Selection 过滤掉 c > “xxxx”
Like_Limit 对 a，b(16) 进行评估

延伸二

对于普通索引，差不多也是有效的：

create table t (id bigint, a varchar(8), b varchar(32), c varchar(32), idx(a, b, c));
select * from t where a > "xxxx" and c > "xxxx" order by a, c limit n;

执行计划在 IndexRangeScan 的 Build 改为：

IndexRangeScan，range:[“xxx” , +inf]，不带 c 的参数
Selection 过滤掉 c > “xxxx”
Like_Limit 对 a 评估

create table t (id bigint, a varchar(8), b varchar(32), c varchar(32), idx(a, b));
select * from t where a > "xxxx" order by a, c limit n;

在这里 c 并不是索引列，执行计划在 IndexRangeScan 的 Build 改为：

IndexRangeScan，range:[“xxx” , +inf]，不带 c 的参数
Like_Limit 对 a 评估

create table t (id bigint, a varchar(8), b varchar(32), c varchar(32), primary id);
select * from t where order by id, a limit n;

由于 id 是主键，那么 order by id, a 等价于 order by id，这里其实逻辑优化就可以去掉，这个 sql 本身很奇葩，但是 tidb 也没有很好地优化掉。

执行计划在 IndexRangeScan 的 Build 改为：

IndexRangeScan，range:[-inf , +inf]，不带 c 的参数
Like_Limit 对 id 评估

总结与扩展

select * from t where [col_a condition, con_b condition, col_c condition] order by col_a, col_b, col_c limit m, n

t 有索引 idx (col_a, [col_b, col_c])

这类 sql 有个特点：

特点1，order by 的列和 idx 的列，具有前面公有的、连续的一部分相同列，假设公有部分是 col_common
特点2，where condition 中，只包含 col_common 的列的 condition 条件

说明：col_common 遇到不同的列或者具有前缀索引的列时终止

说明：

col_common 遇到不同的列或者具有前缀索引的列时终止
如果是遇到不同的列，那么此列不属于 col_common
如果遇到前缀索引，那么此列前缀长度属于 col_common

特点 2 的约束是为了通过 IndexRangeScan 检索的结果，能判断某个值一定不属于查询的正确值

对于上面的 sql 特点，如果出现以下情况：

特点 1，col_common 是完整的 order by 的列
where condition 只跟 order by 的列有关

那么 tidb 可以完美的使用 idx 索引，通过 IndexRangeScan 的 keep order:true 高效的完成数据检索。

func (ds *DataSource) isMatchProp(path *util.AccessPath, prop *property.PhysicalProperty) bool {
   var isMatchProp bool
   ...
   all, _ := prop.AllSameOrder()
   // When the prop is empty or `all` is false, `isMatchProp` is better to be `false` because
   // it needs not to keep order for index scan.

   // Basically, if `prop.SortItems` is the prefix of `path.IdxCols`, then `isMatchProp` is true. However, we need to consider
   // the situations when some columns of `path.IdxCols` are evaluated as constant. For example:
   // ```
   // create table t(a int, b int, c int, d int, index idx_a_b_c(a, b, c), index idx_d_c_b_a(d, c, b, a));
   // select * from t where a = 1 order by b, c;
   // select * from t where b = 1 order by a, c;
   // select * from t where d = 1 and b = 2 order by c, a;
   // select * from t where d = 1 and b = 2 order by c, b, a;
   // ```
   // In the first two `SELECT` statements, `idx_a_b_c` matches the sort order. In the last two `SELECT` statements, `idx_d_c_b_a`
   // matches the sort order. Hence, we use `path.ConstCols` to deal with the above situations.
   if !prop.IsEmpty() && all && len(path.IdxCols) >= len(prop.SortItems) {
      isMatchProp = true
      i := 0
      for _, sortItem := range prop.SortItems {
         found := false
         for ; i < len(path.IdxCols); i++ {
            if path.IdxColLens[i] == types.UnspecifiedLength && sortItem.Col.Equal(nil, path.IdxCols[i]) {
               found = true
               i++
               break
            }
            if path.ConstCols == nil || i >= len(path.ConstCols) || !path.ConstCols[i] {
               break
            }
         }
         if !found {
            isMatchProp = false
            break
         }
      }
   }
   return isMatchProp
}

但是对于这种类型 sql 其他的情况，tidb 并没有一个较好的执行计划，尤其对于以下场景，性能较差：

limit m, n; 其中 m+n 值很小
where 条件要过滤的 range 很大（扫描的 keys 特别多）

所以针对这种类型 sql，利用 col_common，通过走 idx 索引的 IndexRangeScan 的 keep order:true，减少 rowId （聚簇索引时候为主键）的扫描，从而减少 TableRowIDScan_的回表，将能极大的提高查询的性能。

通过将 col_common 下推，实现 tikv server 的 Like_Limit 与 tidb server 的 Like_TopN，实现跟 isMatchProp 为 true 时差不多的查询效率。

TiDB Hackathon

我们以此为主题，参加本届 Hackathon，四个非专业内核人员，抱着以学习为主的心态，进行一些简单的摸索与尝试。

队名：摸鱼不

口号：摸鱼不对，向大佬学习才对

队员：

chenyf 电信云-后端开发
liuke 杭州网易-后端开发
zhaoxugang 深圳顺丰-后端开发
jiyf 电信云-后端开发

队长：jiyf

前缀索引在特殊场景下的优化尝试

背景

分析原因

普通联合索引

前缀联合索引

优化思考

优化尝试

正确性证明

归纳

延伸一

延伸二

总结与扩展

TiDB Hackathon